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1.  Introduction 


One  of  the  hardest  problems  in  all  of  physical  oceanography  is  real-time  La- 
grangian  prediction.  Accurate,  reliable,  real-time  Lagrangian  prediction  of  the 
movement  of  overboard  people,  distressed  ships,  downed  aircraft,  drifting  icebergs, 
pollutants,  etc.  in  ocean  surface  waters  is  crucial  for  U.S.  Coast  Guard  (USCG) 
activities  related  to  Search  and  Rescue  (SAR),  International  Ice  Patrol,  and  envi¬ 
ronmental  protection.  In  particular,  SAR  missions  are  global  and  require  global 
environmental  data  sets  for  planning  optimal  search  strategies. 

Accurate  and  contemporaneous  estimates  of  environmental  variables  such  as 
wave  height,  wind  speeds,  and,  most  importantly,  sea  surface  velocity  are  required 
to  achieve  the  highest  probabiUty  of  a  successful  rescue.  However,  in  many  sit¬ 
uations  such  data  are  not  available  and  historical  estimates  and  their  associated 
variances  are  used  in  computer  models  for  identifying  regions  of  highest  probabihty 
of  containment.  These  computer  models  require  historical  sea  surface  velocity  and 
their  variances  on  a  regular  grid. 

Two  recent  studies  (Perkins  and  Marsee,  1994;  Viekman  and  Jordan,  1991) 
in  the  Florida  Current  region.  Gulf  Stream,  and  Cahfornia  Current  region  have 
documented  serious  deficiencies  in  the  present  historical  data  base  used  in  SAR 
operations.  The  major  problems  are  that  (1)  the  surface  velocities  in  the  present 
operational  data  base  are  too  slow  in  the  major  currents  analyzed  and  (2)  the 
velocity  fields  are  too  smooth.  For  instance,  Viekman  and  Jordan  (1991)  calculated 
a  40-90  cm/s  difference  in  zonal  and  meridional  velocity  components  between  the 
historical  data  base  and  surface  drifters.  Both  studies  recommended  that  a  new 
historical  data  base  be  constructed,  for  USCG  SAR  operational  use,  from  ship  drift 
data  archived  by  the  U.S.  Navy.  This  report  details  the  construction  of  a  new  SAR 
historical  data  base  from  the  Maury  Ship  Drift  Data. 

The  largest  historical  data  base  of  sea  surface  velocity  is  the  Maury  Ship  Drift 
Data  (MSDD)  compiled  by  the  U.S.  Naval  Oceanographic  Office  (NAVOCEANO). 
These  data,  described  in  Section  2,  (1)  have  an  irregular  distribution  in  space  and 
time;  (2)  have  large  measurement  errors;  and  (3)  contain  extreme  unreahstic  data 
outhers  that  must  be  removed.  Section  3  contains  a  description  of  a  generahzed 
median  filter  for  removing  data  outliers  and  an  objective  analysis  (OA)  technique 
that  was  used  for  space/time  interpolation  of  the  irregularly  distributed  data  onto 
a  regular  grid.  Examples  of  the  analysis  velocity  fields,  and  the  seasonal  and  annual 
averages  of  the  sea  surface  velocity  analysis  are  presented  in  Section  4.  The  limita¬ 
tions  of  the  MSDD  and  our  analysis  methodology  and  recommendations  for  future 
improvement  of  sea  surface  velocity  estimates  and  applications  for  SAR  activities 
are  discussed  in  Section  5. 
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2.  Maury  Ship  Drift  Data 

The  sea  surface  velocity  data  base  used  here  consists  of  historical  ship  drift 
measurements  obtained  from  NAVOCEANO.  This  data  base,  known  as  the  Maury 
Ship  Drift  Data,  is  named  after  Lt.  Matthew  Fontaine  Maury  who,  during  the 
years  1842-1861,  inaugurated  the  tabulation  of  ship  drift  from  historical  merchant 
logbooks  and  a  hydrographic  reporting  program  among  shipmasters  (Bowditch, 
1984).  This  information  has  been  used  to  compile  pilot  charts  under  the  auspices 
of  NAVOCEANO  (1955,  1966,  1979a,  1979b,  1979c). 

Ship  drift  is  computed  as  the  vector  difference  between  dead-reckoning  position 
and  the  actual  position  over  a  12-24  hour  period.  Averaging  over  24  hours  practi¬ 
cally  ehminates  tides,  inertial  motion,  and  small-scale  ocean  variabihty,  and  yields 
an  estimate  that  resolves  oceanic  scales  on  the  order  of  a  few  hundred  kilometers 
(Wyrtki  et  ai,  1976;  Richardson  and  McKee,  1984). 

Strictly  speaking,  drift  velocities  differ  from  true  sea  surface  velocities  mainly 
because  of  windage  effects  on  the  ships.  The  differences  between  these  two  quantities 
are  detailed  in  Section  3.2  when  errors  are  assigned  to  the  ship  drift  measurements. 
Nevertheless,  a  number  of  oceanographic  studies  have  used  this  data  set  to  estimate 
surface  mean  and  eddy  kinetic  energy  levels  (Wyrtki  et  al,  1976;  Richardson  and 
McKee,  1984),  to  study  seasonal  circulation  patterns  (Stidd,  1975;  Meehl,  1982; 
Richardson  and  Reverdin,  1987),  and  to  evaluate  numerical  models  (Richardson 
and  Philander,  1987;  Arnault,  1987).  These  studies  and  the  comparison  between 
ship  drift,  buoys,  and  current  meters  by  McPhaden  et  al.  (1991)  clearly  demonstrate 
that  ship  drift-based  velocity  estimates  are  a  surprisingly  good  indicator  of  mean 
sea  surface  velocities  and  their  seasonal  variations. 

The  data  base  used  here,  acquired  from  NAVOCEANO  by  USCG  Research  and 
Development  Center  (R&D  Ctr),  was  last  updated  November  1991.  The  number 
of  observations  is  4.16  milhon  ship  drift  estimates.  There  may  be  up  to  0.7  milhon 
missing  observations  from  this  data  base  because  of  probable  accidental  deletion 
during  file  transfers  (Richardson,  1989).  Most  of  the  missing  observations  (63  per¬ 
cent)  are  probably  in  the  data-sparse  South  Pacific.  Finding  the  missing  data  and 
adding  the  new  data,  especially  surface  drifters  (see  Section  5),  should  be  given  high 
priority  by  the  USCG  R&D  Ctr. 

Data  from  1900  to  1991  are  used  in  this  analysis.  Navigation  errors  were 
significantly  reduced  during  the  early  part  this  century  through  a  combination  of 
improved  optics  in  the  sextants  and  more  stable  chronometers.  Comprehensive 
statistics  for  this  data  base  are  given  in  Richardson  (1989)  and  references  therein. 
Table  1  presents  the  number  of  edited  observations  for  each  month  for  each  of  three 
major  basins. 
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Table  1.  Number  of  Edited  Ship-Drift  Observations  for  Three  Major  Basins 


Atlantic 

Indian 

Pacific 

January 

147077 

48307 

70596 

February 

99752 

38351 

54918 

March 

165768 

48008 

109170 

April 

109285 

37518 

70534 

May 

182938 

42725 

94765 

June 

88293 

41457 

65195 

July 

163220 

41640 

54801 

August 

174074 

42551 

57113 

September 

164577 

41942 

54276 

October 

120494 

39441 

54510 

N  ov  ember 

149695 

38689 

84438 

December 

77015 

40980 

49562 

Note  in  Table  1  the  low  number  of  observations  for  June  and  December  for  the 
Atlantic  Ocean.  Richardson  (1989)  also  noted  this,  and  argued  for  probable  deletion 
or  omission  of  the  data  file  in  the  North  Atlantic.  In  addition,  Richardson  noted 
in  the  Pacific  data  that  there  were  a  small  number  of  observations  for  the  months 
of  February,  April,  June-October,  and  December  for  the  North  Pacific  and  that 
only  three  relatively  data-rich  months,  March,  April,  and  June,  are  available  in  the 
South  Pacific. 

A  monthly  temporal  resolution  was  chosen  for  our  estimates.  This  frequency 
allows  for  adequate  resolution  of  the  important  semi-annual  and  annual  compo¬ 
nents  of  the  flow  field  and  barely  resolves  the  circulation  features  associated  with 
the  energetic  50-60  day  wind  events.  The  important  1-10  day  weather  scale  was 
not  resolved  and  should  be  modeled  as  the  random  component  in  the  trajectory 
simulations.  Higher  temporal  resolution  would  significantly  lower  the  confidence 
in  the  estimates  and  would  create  many  data- void  gaps,  especially  in  the  southern 
oceans.  Instead  of  the  twelve  monthly  estimates,  only  one  annual  estimate  was 
made  south  of  50°  S  because  of  data  sparsity.  In  general,  the  North  Atlantic  has 
the  highest  data  density  and,  consequently,  the  most  rehable  estimates,  and  the 
South  Pacific  has  the  lowest  data  density  and  the  least  rehable  estimates. 

The  monthly  data  sets  were  also  binned  into  twelve  basins  for  the  analysis. 
This  binned  data  set  is  known  as  the  Monthly  MSDD  (MMSDD).  This  was  done  for 
two  primary  reasons;  (1)  to  make  the  computations  more  manageable  so  that  they 
could  be  done  on  a  workstation;  and  (2)  to  prevent  artificial  leaking  of  information 
during  the  interpolation  procedure  over  land  masses.  For  example,  given  the  300  km 
correlation  scales  used  (see  Section  3.2),  Caribbean  Sea  velocities  would  be  used  in 
the  estimation  of  Pacific  velocities  in  the  region  surrounding  the  Isthmus  of  Panama, 
and  vice  versa  if  the  data  were  not  binned  into  separate  files.  Also,  velocity  data 
from  coastal  seas,  with  much  different  dynamics  and  smaller  decorrelation  scales, 
would  not  influence  the  estimates  in  nearby  oceanic  regimes. 
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The  twelve  basins  are  the;  Antarctic  Ocean,  Atlantic  Ocean,  Baltic  Sea,  Black 
Sea,  Caribbean  Sea,  South/East  China  Sea,  Indian  Ocean,  Sea  of  Japan,  Mediter¬ 
ranean  Sea,  Pacific  Ocean,  Persian  Gulf,  and  the  Red  Sea.  Data  overlap  regions  at 
the  boundairies  connecting  the  major  basins,  such  as  the  region  south  of  the  Cape 
of  Good  Horn,  are  needed  to  ensure  that  the  final  field  estimates  blend  together 
smoothly.  Data  overlap  regions  are  on  the  order  of  (1000  km)^  for  the  OA  calcu¬ 
lations.  This  area  is  greater  than  three  times  the  correlation  length  scales  in  each 
direction  (west-east,  south-north)  so  that  bigger  overlap  regions  would  not  add  any 
new  information  to  the  estimates  at  the  “seams”  where  basins  meet.  A  comparison 
of  the  estimates  at  the  basin  seams  is  discussed  in  Section  4. 
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3.  Methodology 


As  mentioned  earlier,  the  MSDD  must  be  edited  to  remove  outliers  and  then 
space-time  interpolated  onto  a  regular  grid  in  space  and  time  for  USCG  operational 
use.  The  spatial  interpolation  grid  was  chosen  by  USCG  R&D  Ctr  and  is  currently 
the  grid  in  operational  use.  Monthly  estimates  for  all  basins,  except  Antarctic,  at 
these  grid  points  were  determined  to  be  sufficient  for  present  operational  use.  The 
next  two  subsections  discuss  the  generadized  median  filtering  algorithm  used  for 
data  editing  (Section  3.1)  and  the  parameter  matrix  objective  analysis  algorithm  of 
Mariano  and  Brown  (1992),  hereafter  MB92,  used  for  data  interpolation  (Section 
3.2). 


3.1  Generalized  Median  Filter 

An  area  encompassing  the  Gulf  Stream  off  the  coast  of  Florida  was  chosen 
for  initial  testing  because  it  contained  both  the  well-defined  Florida  current  and 
because  the  SAR  historical  data  base  was  unrealistic  in  this  very  important  area 
(Perkins  and  Marsee,  1994).  Fig.  la  is  a  representative  OA  map  of  surface  velocities 
using  all  the  May  data  for  this  area.  It  is  quite  evident  that  some  of  the  velocity 
estimates  are  poor  even  after  smoothing  by  the  OA  scheme  (though  for  this  map, 
the  smoothing  was  kept  to  a  minimum).  Inspection  of  the  input  data  revealed 
extreme  data  outhers.  The  presence  of  outhers  was  expected  with  the  MSDD  based 
on  previous  work  by  Richardson  and  colleagues.  Fig.  lb  shows  the  OA  map  after 
the  outhers  were  removed  using  a  simple  procedure  that  calculated  the  arithmetic 
mean  and  the  standard  deviations  of  u  and  u,  and  that  removed  every  data  point 
that  was  more  than  three  standard  deviations  from  the  mean.  A  technique  for 
removing  outhers  from  the  basin-wide  data  subsets  was  formulated,  based  on  the 
encouraging  results  of  this  initial  mapping  experiment. 

Most  methods  for  removing  outhers  require  a  measure  of  central  tendency, 
Q,  such  as  the  arithmetic  mean  or  median  value,  and  a  measure  of  variation  or 
spread,  such  as  the  standard  deviation,  ctq.  An  envelope  of  acceptable  data  values, 
(Qi)Q2))  is  constructed  by  calculating 


(3.1a) 
(3.16) 
In  many 

Qi  <Qi<  Q2, 

then  Qi  is  deemed  a  good  data  point  and  is  kept  for  analysis.  Otherwise,  it  falls 
outside  the  acceptable  data  value  range  and  is  removed  from  the  analysis. 

In  many  oceanographic  apphcations,  including  this  one,  there  are  a  number  of 
practical  difficulties  in  using  this  simple  formula.  These  difficulties  are  encountered 


Qi  =  Q- 
Q2  =  Q  +  aaq, 

where  a  is  a  factor  controUing  the  width  of  the  acceptable  data  envelope, 
apphcations,  a  is  2-3.  If  the  data  value  Qi  is  between  Qi  and  Qz^  viz. 
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because:  (dl)  the  ocean  is  energetic  over  a  wide  spectrum  of  scales  and  its  variability 
is  greater  than  the  mean  in  most  places;  (d2)  the  ocean  is  heterogeneous-the  mean 
flow  in  the  subtropical  gyre  is  a  few  cm/s  while  in  the  Gulf  Stream,  the  flow  is 
100-250  cm/s;  (d3)  the  variation  in  a  is  also  large,  c.5.,  the  surface  velocity  Eddy 
Kinetic  Energy  (EKE)  maps  of  Wyrtki  ti  al.  (1976)  vary  by  a  factor  of  six  to  seven; 
and  (d4)  anomalous  ocean  events,  order  of  ten  standard  deviations  from  the  mean, 
occur  (McWilhams  et  a/.,  1983). 

If  the  measurement  error  were  small,  a  large  anomaly  would  be  real  if  mea¬ 
sured,  and  difficulty  (d4)  would  vanish.  However,  the  measurement  error  for  the 
MMSDD  is  large  and  the  necessity  of  removing  outhers  was  demonstrated  in  Section 
2.  Thus  we  must  live  with  (d4)  and  an  algorithm  deahng  with  difficulties  (dl-d3) 
was  constructed.  This  algorithm  flags  good  data  as  bad  data  for  extreme  events, 
but  this  is  unavoidable  with  this  data  set. 

In  fight  of  (dl-d3),  the  basin-wide  application  of  eqn.  (3.1)  requires  Q{x,y) 
and  (rQ{x,y)  for  Q  =  u,v.  Note  that  the  temporal  dependency  in  these  terms  is 
not  needed  since  the  data  have  been  binned  into  months.  In  general,  a  could  also 
be  a  function  of  x  and  y,  but  this  was  not  tried.  Empirical  determination  of  a  is 
discussed  below. 

The  MMSDD  was  binned  into  5°  x  5®  bins  for  outlier  removal  and  OA  calcu¬ 
lations.  The  selection  of  bin  size  was  a  compromise  between  a  small  bin  size  that 
would  better  resolve  the  field  and  a  large  bin  size  so  that  Q(x,y)  and  <t(®,2/)  are 
estimated  from  a  large  sample  size.  For  each  bin  k,  the  standard  deviation  of  u  and 
V,  denoted  by  au  and  Cv,  respectively,  are  estimated  using  the  standard  formulas 


(n-1)  ’  ' 
(n-l)  ^  ’ 


where. 


Q  =  '^Qiln, 

for  Q  =  Uk  and  Vk-  <Tu  and  (r„  are  assigned  to  the  midpoint  of  the  bin.  A 
smooth  bicubic  spline  (Inoue,  1986;  MB92)  is  fitted  to  all  assigned  (Tu  values  and 
all  assigned  values  generating  two  sets  of  spline  coefficients,  one  for  one  for 
a-u,  for  each  of  the  twelve  basins  and  for  each  of  the  twelve  months. 

The  outlier  procedure  is  described  next  for  one  basin  and  one  month.  For  each 
5°  X  5°  bin  k,  Uk  and  Vk  were  estimated  using  a  median  filter  since  this  measure 
of  central  tendency  is  better  suited  than  the  arithmetic  average  for  data  sets  with 
extreme  outliers.  Let  the  position  of  the  data  point  be  denoted  by  (3Ci>2/i). 

and  crt,(®i,2/i)  are  ccilculated  by  evaluating  the  bicubic  spline  for  au  and 
respectively,  at  each  position  (®i,2/t).  Uk  and  Vk  for  each  of  the  k  bins  are 
calculated  by  the  taking  the  median  value,  defined  by  ordering  the  data  values. 
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Ql  ^  Qz  ^  ^  Qj  ^  —  Qn—l  ^  Qm 


and  defining  if  n  is  odd 


j  =  j  +  1  =  (n  +  l)/2, 


and  if  n  is  even 


j  =  n/2,  ji  +  1  =  (n  +  2)/2. 

The  median  value,  Q,  is  then  defined  as 

Q  =  +  Qj+i)- 

Let  Si  denote  the  speed  associated  with  the  data  point,  Si  =  Let  Sk 

denote  the  speed  associated  with  the  median  values  of  Uk  and  Vk,  ^k  = 

Let  O',  =  +  0^2  represent  the  standard  deviation  of  the  field  evaluated  at  the 

point  (®i,2/i).  If 


Sk  -  aas{xiiyi)  <  Si  <  Sk  +  ao^5(a:i,l/0) 

is  true  then  the  data  point  (ui,Vi)  is  deemed  good  and  is  used  in  the  analysis.  It  is 
removed  from  the  analysis  if  it  fails  this  condition. 

The  value  for  o  was  determined  empirically.  If  a  is  too  small,  too  many  good 
data  points  are  removed.  If  o  is  too  large,  too  many  bad  data  points  are  retained  for 
analysis.  For  example,  if  the  velocity  components  were  truly  Gaussian  distributed, 
then  a  value  of  a  «  2  (3)  would  remove  5%  (1%)  of  the  good  data  points.  A  range 
of  a  from  1. 5-4.0  was  tested  by  viewing  the  fields  and  noting  that  the  largest  o,  value 
contained  no  obvious  outhers.  The  best  value  of  u,  for  this  data  set,  was  determined 
to  be  3. 


3.2  Objective  Analysis 

Objective  Analysis,  hereafter  OA,  is  used  here  for  interpolating  irregularly 
spaced  asynoptic  observations  onto  a  regular  grid  at  a  common  estimation  time 
for  subsequent  analysis.  OA  has  been  used  extensively  for  mapping  oceanic  fields 
(e.^.,  Bretherton  et  a/.,  1976j  Freeland  and  Gould,  1976j  Carter,  1983)  Davis,  1985, 
McWilliams  et  al,  1986;  Carter  and  Robinson,  1987;  Robinson  et  al,  1987,  Watts 
et  ai,  1989;  and  Mariano  and  Brown,  1992).  OA  has  its  roots  in  the  analysis  of 
meteorological  variables  {e.g.,  Gandin,  1963;  Thiebaux,  1974;  Ghil  et  al.,  1981; 
Thiebaux  and  Pedder,  1987;  Daley,  1991).  In  general,  an  OA  of  data  requires  (1) 
estimates  of  the  covariance  (correlation)  function  of  the  variable  of  interest,  (2)  the 
covariance  function  of  the  data  variable,  and  (3)  the  cross-covariance  function  be¬ 
tween  the  variable  of  interest  and  the  data.  A  brief  introduction  to  the  theory  of  OA 
is  presented  next  in  terms  of  a  scalar  variable  T  that  represents  temperature  since 
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the  software  was  developed  for  the  estimation  of  global  Sea  Surface  Temperature 
(SST).  Velocity  components  u  and  v  are  mapped  independently  by  the  following 
method  outhned  for  T. 

Since  a  perfectly  accurate  temperature  can  never  be  measured,  the  i*  mea¬ 
surement  Ti  is  denoted  as 

Ti  =  Ti  +  Cj, 

where  ti  is  the  measurement  error,  (cj)  is  the  average  error  at  the  location. 
Our  goal  is  to  find  the  best  estimator  of  the  form 

To  =  aifi  =  ^  ai{Ti  +  e^.  (3  2) 

i=l  i=l 

that  takes  into  consideration  both  error  and  distance  information.  The  usual  choice 
for  the  criterion  of  merit  for  the  notion  of  best  is  a  minimum  variance  unbiased 
estimator. 

First,  we  examine  whether  the  estimator  is  unbiased.  The  following  relation 

(t>= 

t=l  t=l 


shows  that,  if  (Ti)  =  0,  then  (To)  =  0  and  the  estimator  is  unbiased. 

Our  estimator  will  always  be  unbiased  if  we  consider  only  the  deviations  of  the 
measurement  from  its  average  state, 

f'  =  ti-(ti), 

because  it  will  always  have  an  average  of  zero,  since  by  definition 

(fi)^  (T<)-{(i;))=  {t,)-(r.)=o. 

The  middle  equahty  uses  the  fact  that,  for  statistically  stationary  fields,  (Ti)  is  a 
constant  (it  may  be  a  different  constant  for  different  values  of  i)  and  the  average  of 
a  constant  is  obviously  the  constant  itself.  Thus  deviations  from  an  average  state 
wiU  always  be  assumed  for  any  measurement  and  the  prime  notation  is  dropped 
for  convenience.  In  practice,  only  an  estimate  of  the  mean  field  is  available.  The 
estimated  mean  field  is  referred  to  as  the  trend. 

The  variance  of  our  estimator  is 


Since  our  estimator  is  unbiased  (i.e.,  (To)=  To),  the  variance  of  the  estimator  can 
be  written 


((T,  -  T.f  >=  ((X;  aA  -  T,f)=  <E  +  a)  -  T.p), 


(3.3) 


i=l 


i-l 
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where  we  recaJl  that  T„  is  the  true  value. 

The  unknown  weighting  coefficients,  Oj,  are  found  by  first  minimizing  the  vari¬ 
ance  in  (3.3).  Derivatives  with  respect  to  each  coefficient  aj  jrield  a  set  of  hnear 
equations  for  j  =  1, 2, . . . ,  n, 

(2[^  ai{Ti  +  e,)  -  r„][T,-  -h  c,])  =  0. 
i-\ 

(3.4.1)  can  be  written  as  either 

ai{Ti  -h  ei)(Tj  -f  e,))  =  (r„(r,  4-  Cj)), 

t=i 

or 

^ai[{TiTj)+{eiTj)+{Tiej)^eiej)\  =  (To(T, +e^)), 

i=l 

=  {Tof,). 

It  is  usually  assumed  that  there  is  no  relationship  between  the  error  field  and 
temperature  field  on  the  average,  such  that 

(r,e,)=  (riey>=  0.  (3.4.4) 

Consequently,  for  j  =  1,2, . . .  ,n,  (3.4)  becomes 

^  ail(riT,)+(eie,)l  =  (T,fy).  (3.5) 

i=l 

Since  T^,  Tj  ,  and  To  are  deviations  from  their  average  values,  (TjT,  )  is  the  co- 
variance  function  of  the  true  temperature  field  and  (ToTj  )  is  the  covariance  between 
the  temperature  of  interest  and  the  measurement  fj.  If  the  errors  are  unbiased  (i.  e., 
(ci)  =  (cj)  =  0),  then  (e»ej)  is  the  error  covariance.  Also,  since  (TiCj)  =  0,  it  is 
easily  shown  that  the  bracketed  ([  ])  terms  in  (3.5)  can  also  be  written 

(ryTy)+(eiey)=  ((T.  +  ei)(Ty  +  e,))=  (f j,),  (3.6) 

which,  for  all  i  and  j,  is  the  covariance  function  of  the  measured  temperature  field. 
Both  a  dynamical  component,  because  the  structure  of  the  covariance  function  is 
usually  determined  by  mesoscale  and/or  wave  dynamics  and  a  measurement 

component  (eiCj),  contribute  to  the  covariance  function  of  the  observations.  These 
components  are  uncoupled  by  (3.4.4). 

How  accurate  is  this  estimator?  In  our  example,  the  covariance  function  of 
the  true  temperature  field  is  also  needed  to  determine  the  error  variance  associated 
with  our  estimator.  To  see  this,  substitute  (3.6)  into  (3.5)  and  expand  to  give 


(3.4.1) 


(3.4.2) 


(3.4.3) 
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ai(tiTi)+a2(fir2)+  ...  +a„(fifn)  =  (Tifo) 


ai{TnTi^+a2(TnT2^+  ...  +<ln(^nTn)  —  {TnTo} 
which,  upon  substitution  of  the  following  matrix  definitions  (*  is  transpose), 


'TT 


r*  =  (fi,f2. 

/(tifi)  . 

\{Tnfl)  . 

a*  =  (ai,a2,. 

.  .  .  ,  ttn), 

=  {{TA),  (T.f,), ....  (T  j„)), 

and  rearrangement,  becomes 

»  =  (3.7) 

Note  that  is  the  covariance  between  the  observations.  It  is  sometimes 

more  rigorously  denoted  as  since  the  matrix  is  formed  by  the  expected  value  of 

the  outer  product  of  T  and  T*.  The  transpose  operator,  *,  is  dropped  for  notational 
convenience.  AU  the  second  arguments  in  our  notation  for  covariance  functions  are 
the  transpose  of  that  argument.  Moreover,  in  many  of  the  references,  the  second 
argument  is  dropped  when  it  is  the  same  as  the  first  argument. 

The  variance  of  our  estimator,  is  defined  to  be 


^f,fo  ~  )■ 

The  following  formula  for  estimation  variance  can  easily  be  derived. 


(3.8.1) 


The  square  root  of  (3.8.1),  or  the  standard  deviation,  is  usually  referred  to  alterna¬ 
tively  as  the  estimation  error,  mapping  error,  or  analysis  error.  (3.8.1)  may  also  be 
written  as 


-C 


-  r**  - 

r,T'-'tt  T.T' 


(3.8.2) 


The  first  term  on  the  right  hand  side  of  (3.8.1)  is  the  natural  variation  in  the 
true  temperature  at  the  point  of  interest,  To,  and  the  second  term  on  the  right 
hand  side  of  (3.8.1)  measures  the  information  content  of  the  observations.  If  the 
covariance  between  To  and  T  is  zero,  as  it  would  be  if  the  measurements  were  very 
distant  from  the  location  of  interest,  then 
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(a)  the  second  term  on  the  right  hand  side  of  (3.8.1)  and  a  in  (3.7)  are  zero, 

(b)  the  best  estimate  of  To  is  the  mean  value,  and 

(c)  the  error  variance  associated  with  our  estimate,  To,  is  the  field  variance  at 
location  O, 

The  measurements  added  no  new  information  and  the  estimation  error  is  the  square 
root  of  the  variance  of  the  temperature  field  at  the  location  of  interest.  Of  course 
in  practice,  the  true  field  variance  is  an  estimate  based  on  all  a  priori  information; 
historical  data,  concurrent  data,  and  theoretical  expectations. 

A  more  general  estimation  problem  is  the  estimation  of  the  temperature  field 
at  m  different  locations  given  n  different  measurements.  Instead  of  a  vector  of  coef¬ 
ficients,  as  we  had  in  (3.7),  we  now  need  to  find  a  m  x  n  matrix  of  coefficients.  A,  to 
estimate  the  field  at  the  m  locations  of  interest.  Define  a  vector  To  =  (Toi  ,...,Ton), 
which  is  the  true  temperature  field  at  the  locations  of  interest,  and  its  associated 
field  of  temperature  estimates  To-  Eqn.  (3.7)  generalizes  in  a  straightforward 
fashion  to 

A  =  (3-9) 

where  Ct.  .f.  and  A  are  now  m  x  n.  matrices.  The  variance  of  our  estimator  To  is 

1  o  A 


^t„T. 


=  Ct„t„  - 


(3.10) 


Eqns.  (3.9)  and  (3.10)  are  a  version  of  the  Gauss-Markov  theorem  for  a  hnear 
minimum  mean-square  estimate  of  a  random  variable.  It  should  be  noted  here  that 
the  Gauss-Markov  theorem  is  not  affected  by  replacing  the  covariance  functions 
with  covariance  functions  normalized  by  their  value  at  zero  lag  or  correlation  func¬ 
tions.  Correlation  functions  are  generally  used  in  practice  because  of  numerical 
convenience.  Thus  C  in  all  of  the  previous  formulas  can  either  denote  covariance 
or  correlation. 

In  fact,  the  matrices  associated  with  the  covariance  functions  can  undergo  an 
affine  transformation  (magnitude  change,  rotation,  translation).  Structure  functions 
(e.^.,  Gandin,  1963) 

Gtt  =  2  *  “  ^tt)  = 


0  ((Ta-T2)2) 

V((r„-T.f)  ((T.-Tsf) 


((T,  -T„)2)\ 
0  / 


(3.11) 


can  be  used.  Ctt  is  the  correlation  function.  The  advantage  of  structure  functions 
is  that  a  mean  is  not  removed  from  the  observation.  Interpolation  using  struc¬ 
ture  functions  is  known  as  Kriging  in  geostatistics  (e.^.,  Journel  and  Huijbregts, 
1978),  and  its  use  is  gaining  popularity  in  oceanography  (Denman  and  Freeland, 
1985;  Hanson  and  Herman,  1989).  The  biggest  disadvantages  of  Kriging  are  that 

(1)  the  present  algorithms  do  not  allow  for  spatial  and  temporal  correlations  and 

(2)  the  structure  function  has  considerable  curvature  near  zero  lag.  This  extreme 
curvature  near  zero  lag  hampers  estimation  of  parametric  forms  of  the  structure 
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function.  Kriging  was  formulated  for  mapping  ore  deposits  that  do  not  move  over 
the  observing  time.  When  mapping  ocean  features,  especially  in  strong  frontal 
regimes,  phase  speeds  are  needed  to  move  non-synoptic  observations  to  the  location 
where  they  would  be  at  the  estimation  time.  Also,  a  temporal  correlation  factor 
is  needed  to  account  for  the  loss  of  correlation  in  time  due  to  turbulence  and/or 
stochastic  forcing. 

The  primary  subjective  components  of  an  OA  scheme  are  the  selection  of  the 
assumed  observational  noise  level,  the  number  of  influential  data  points,  the  mean 
field,  and  the  correlation  function(s).  The  four  subjective  components  are  discussed 
next. 


The  numerical  value  of  the  error  is  given  as  a  fraction  of  total  field  variance,  typ¬ 
ically  in  the  range  of  0.05-0.50.  Fortunately,  the  accuracy  of  the  objective  analysis  is 
only  weakly  dependent  upon  the  actual  value  of  this  dilficult-to-estimate  parameter. 
On  the  average,  an  OA  estimate  at  an  observing  location  will  be  within  the  noise 
level  of  the  observed  value.  The  larger  the  assumed  noise  level,  the  smoother  the 
OA  field.  For  the  case  of  assumed  uncorrelated  noise,  the  noise  level  parameter 
appears  in  the  main  diagonal  of  the  observation’s  covariance  matrix  as  an  additive 
term.  The  assumption  of  uncorrelated  noise  is  poor  when  the  spacing  between 
observing  stations  is  much  less  than  the  distance  between  the  estimation  locations. 
For  instance,  if  you  are  making  large-scale  maps  using  stations  that  resolve  the 
energetic  mesoscale  only  in  data-rich  patches,  uncorrelated  noise  is  not  a  good  error 
model  (see  Clancy  (1983)  for  an  excellent  discussion  of  this  point).  The  assump¬ 
tion  of  uncorrelated  noise  is  also  poor  for  systems  that  measure  with  a  bias.  Bias 
reduces  the  amount  of  independent  information  that  the  observations  provide,  but 
it  enhances  gradient  calculations  (Seaman,  1977).  Biased  noise  also  puts  a  lower 
limit  on  the  mapping  error  that  does  not  exist  for  random  noise  (Gandin,  1963). 
See  Bergman  (1978)  and  Chelton  (1983)  for  further  discussion  on  correlated  errors. 

Since  the  noise  level  increases  the  magnitude  of  the  diagonal  elements  of 
a  large  enough  noise  level  will  guarantee  a  positive  semi-definite  matrix,  and  the 
numerical  inversion  of  will  be  stable.  Hence,  it  is  always  advisable  to  have 

some  nonzero  value  for  the  assumed  noise  level. 


An  OA  estimate  is  essentially  a  weighted  average  of  observations  with  the 
weights  chosen  by  eqn.  (3.7).  The  number  of  observations  (“limit  )  that  influence 
the  OA  estimate  must  also  be  chosen.  The  maximum  number  of  points  to  use 
in  this  weighted  average  is  the  “limit,”  which  typically  ranges  from  5-20  points. 
On  the  one  hand,  large  values  of  “hmit”  improve  the  statistical  confidence  in  the 
estimate  because  of  a  large  number  of  degrees  of  freedom.  On  the  other  hand, 
larger  than  necessary  values  of  “limit”  do  not  improve  the  accuracy  of  the  OA 
estimate  and  may  smear  features  in  heterogeneous  regions.  (Note,  as  indicated 
by  eqn.  (3.9),  that  a  n  x  n  matrix  must  be  inverted  for  each  estimation  point; 
n  is  an  upper  bound  on  “limit.”)  While  smaller  values  of  hmit  lead  to  faster 
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computation,  the  OA  estimates  under  these  circumstances  are  more  sensitive  to 
noise  in  the  observations.  K  the  correlation  representation  is  nearly  singular,  a 
relatively  small  value  of  “limit”  may  be  advantageous  since  the  inversion  of  a  small 
matrix  is  less  likely  to  be  numerically  singular  than  a  large  one.  Some  trial  and 
error  experimentation  with  the  value  of  “limit”  is  a  recommended  part  of  the  early 
implementation  of  an  OA.  For  our  analysis,  the  number  of  influential  data  points 
equals  8. 

Ideally,  the  best  influential  data  points  are  those  data  points  most  correlated 
with  the  estimation  location  and  leaist  correlated  amongst  themselves.  The  latter 
requirement  can  be  relaxed  by  averaging  close  and  highly  correlated  points  before 
the  selection  of  influential  data  points.  Then  only  the  former  requirement  of  the 
most  correlated  data  points  with  the  estimation  location  are  used  to  select  influential 
data  points.  The  most  correlated  data  points  are  found  by  sorting  the  absolute  value 
of  the  correlations.  The  absolute  value  is  sorted  since,  for  example,  a  correlation  of 
—0.9  is  more  desirable  than  a  correlation  of  0.1. 


Our  approach  for  decomposing  data  containing  both  large  scale  and  mesoscale 
components  is  to  remove  the  mean  of  the  data  before  the  OA,  and  to  objectively 
interpolate  just  the  deviations  from  the  mean.  The  final  field  estimate  is  the  sum  of 
the  mean  and  the  OA  of  the  deviation  field.  In  practice,  the  mean  field  is  a  function 
of  longitude,  latitude,  depth,  and  time,  and  is  either  a  trend,  climatology,  a  forecast 
from  a  dynamical  model,  or  a  previous  field  estimate.  Common  trends  are  either 
a  constant  average,  linear  trends,  polynomial  surfaces,  splines,  or  different  sets  of 
basis  functions.  A  frequently  used  name,  first-guess  field,  sums  up  the  situation 
nicely. 

OA  works  best  for  interpolation.  Extrapolation  by  OA  or  any  other  technique 
can  be  risky.  For  instance,  an  OA  might  yield  unreahstic  streamhnes  running  in 
and  out  of  a  coastline  or  temperature  values  that  are  far  from  the  historical  range. 
These  problems  are  usually  caused  by  sparse  boundary  data,  which  can  affect  the 
analysis  at  two  different  stages.  First,  a  spatial  trend  calculated  from  sparse  and/or 
patchy  data  may  not  represent  the  true  trend  over  the  entire  analysis  domain.  For 
example,  a  plane  fit  to  the  data  in  a  region  that  contains  a  front  can  give  unrealistic 
estimates  near  the  edge  of  the  analysis  domain  away  from  the  front.  The  second 
problem  with  sparse  data  is  that  only  a  simple  few-parameter  correlation  function 
can  be  estimated  from  the  data.  Based  on  these  considerations,  a  tense  bicubic 
smoothing  spline  was  chosen  to  represent  the  mean  field  in  our  analysis. 

Inoue’s  least-square  finite-element  sphnes  were  chosen  because  they  have  ad¬ 
justable  smoothness  and  tension  parameters,  allow  variable  data  errors,  and,  for 
computational  efficiency,  they  use  a  B-spline  basis  and  iterate  with  binary  subdivi¬ 
sions  (Inoue,  1986).  Because  of  the  rich  large-scale  structure  evident  in  sea  surface 
velocity  fields,  a  trend  consisting  of  a  simple  mean  or  two-dimensional  polynomial  fit 
produces  unreahstic  maps  for  large  analysis  domains.  MB92  used  two-dimensional 
bicubic  sphnes  for  the  trend  surface  in  the  interpolation  of  sea  surface  temperature 
in  the  tropical  Pacific  from  30"5  to  30°N  with  good  results.  The  use  of  tense 
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splines  allows  reasonable  extrapolation  from  strong  frontal  regions  with  large  spa- 
tisl  gradients  to  more  homogeneous  regions.  The  spline  parameters  chosen  were  the 
(inverse)  smoothing  parameter  p  =  10"^,  the  tension  parameter  t=0.99,  and  the 
initial  number  of  knots  for  the  large  basins,  mxO  =  myO  =  60. 


Typical  correlation  functions  are  in  the  form  of  an  oscillatory  term  multiplied 
by  a  decay  term.  The  oscillatory  term  models  wave  behavior  and  is  usually  a 
trigonometric  function  or  a  Bessel  function.  The  decay  term  models  turbulent 
behavior  and  is  usually  an  exponential  (t.c.,  Gaussian)  function  or  inverse  poly¬ 
nomial.  Other  representative  functions  are  described  in  Bretherton  ei  al.  (1976), 
Carter  and  Robinson  (1987),  Juhan  and  Thiebaux  (1975),  Freeland  and  Gould 
(1976),  McWilliams  et  al.  (1986),  Thiebaux  (1975, 1976),  and  Thiebaux  and  Pedder 
(1987).  A  functional  form  for  the  correlation  function  is  assumed  in  the  parameter 
matrix  algorithm  with  the  novelty  of  having  nine  correlation  parameters  that  can 
vary  in  space  and  time.  The  parameter  matrix  algorithm  uses  an  anisotropic,  time- 
dependent  correlation  function  with  correlation  parameters  that  vary  in  space/time, 
and  a  time- dependent  trend  surface  for  efficient  OA  of  dynamically  heterogeneous 
and  non-stationary  fields.  An  observation  (subscript  o)  is  decomposed  into  three 
components. 


To{x,y,t)  =Tr„(®,y,t)  +  Te(a:,y,t)  +  e,(x,y,t),  (3.12) 

where  the  first  term  is  the  contribution  of  the  large  scale  or  trend  field  (subscripted 
m  for  mean),  the  second  term  is  the  natural  field  variability  important  on  the 
mesoscale  or  synoptic  time  scale  (subscripted  e  for  eddy),  and  the  third  term  is  the 
combined  effects  of  unresolved  scales,  i.e,  subgrid-scale  noise,  and  error  from  the 
particular  sensor  (subscript  s  for  subgrid-scale  and  sensor  error). 

Tm  was  estimated  by  least-square  fitting  of  a  bicubic  spline  surface  to  the  data. 
The  anisotropic  and  time-dependent  correlation  model  for  estimating  Tg  from  the 
detrended  data  (To  —  Tm)  is 

C{dx,dy,dt)  =  C(l)[l.  -  {DX/C{4))^  -  {DY/C{5))^]* 

exp -[{DX/C{6))^  +  {DYIC{7)f  +  {dt/C{S))%  (3.13) 


DX  =  dx-  C{2)  *  dt  and  DY  =  dy  -  C(3)  *  dt, 

where, 

•  dx  is  the  east-west  lag;  dy  is  the  north-south  lag;  and  dt  is  the  time  lag. 

•  (7(1)  is  the  correlation  at  zero  lag  and  equals  one  minus  the  normaUzed  (by 
the  field  variance)  measurement  variance  (which  includes  a  subgrid  scale  com¬ 
ponent). 

•  (7(2)  and  (7(3)  are  the  mean  phase  speeds  in  the  east-west  and  north-south 
direction,  respectively. 
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•  C7(4)  and  C7(5)  are  the  zero-crossing  scales  in  the  east-west  and  north-south 
direction,  respectively. 

•  C?(6)  and  C{7)  are  the  spatial  decay  scales  (or  the  e-folding  scales)  in  the 
east- west  and  north-south  direction,  respectively. 

•  C(8)  is  the  temporal  decay  scale. 

This  correlation  function  can  be  rotated  in  space  by  (7(9),  an  arbitrary  angle.  This 
correlation  function  aind  its  isotropic  form,  C(4)  =  C{5)  and  C7(6)  =  C'(7),  have  been 
used  in  a  variety  of  dynamical  regimes  (e.y..  Carter  and  Robinson,  1987;  Robinson 
et  al,  1987;  and  MB92). 

The  parameter  matrix  algorithm  starts  by  dividing  both  the  interpolation  grid 
and  data  domain  into  the  same  rectangular  space/time  bins.  The  dimensions  of  the 
bins  are  based  on  the  correlation  scales  and  how  the  correlation  parameters  change 
in  space  and  time:  (1)  each  bin  must  contain  one  set  of  correlation  parameters  and 
(2)  all  the  data  in  a  bin  should  be  significantly  correlated  with  the  midpoint  of  the 
bin  to  minimize  computations.  The  parameter  matrix  is  formed  by  assigning  the 
values  of  the  nine  correlation  parameters  to  each  numbered  bin. 

The  present  parameter  matrix  algorithm  consists  of  the  following  steps: 

•  0)  Highly  correlated  data  are  averaged. 

•  1)  Each  data  point  is  assigned  a  bin  number. 

•  2)  The  data  points  are  sorted  by  bin  number  so  that  all  the  data  for  each  bin 
are  stored  together  with  their  corresponding  position,  time,  detrended  value 
and  error  level. 

•  3)  A  marker  array  is  used  to  record  the  array  position  where  each  bin  starts 
(or  ends). 

•  4)  The  OA  is  performed  one  interpolation  point  at  a  time. 

•  5)  The  bin  number  of  the  interpolation  point  is  found  and  denoted  by  N . 

•  6)  Data  in  bin  N  and  all  bins  within  an  influential  correlation  window  from 
the  interpolation  point  are  selected  for  determining  the  influential  data  points. 

•  7)  The  correlations  between  the  interpolation  point  and  each  data  point  from 
step  6  are  tabulated  and  sorted. 

•  8)  The  most  influential  data  points,  i.e.,  the  most  correlated  with  the  interpo¬ 
lation  point,  are  used  for  the  OA  and  usually  number  between  6  and  20  data 
points. 

•  9)  Steps  5-8  are  repeated  for  each  interpolation  point. 

If  a  data  point  is  in  one  bin  and  the  interpolation  point  is  in  another  bin,  all 
the  correlation  parameters  from  the  two  bins  are  averaged  except  the  phase  speeds. 
The  phase  speeds  of  the  data  bin  are  used  since  it  is  desirable  to  move  the  data 
points  to  where  they  would  be  at  the  estimation  time.  For  calculating  correlation 
between  data  in  different  bins,  all  the  correlation  parameters  (including  the  phase 
speeds)  are  averaged. 

The  algorithm  allows  two  types  of  random  errors  to  contribute  to  the  diagonal 
elements  of  Cff  and  enter  the  calculation  of  T.  An  environmental  error  due  to 
subgrid-scale  processes  is  assigned  to  C’(l)  in  the  parameter  matrix  and  a  sensor 
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error  is  assigned  to  each  data  point.  For  these  calculations,  C{1)  =  0.9,  and  the 
sensor  error  is  assumed  to  be  0.1  so  that  the  total  normahzed  error  level  is  0.2. 

The  exact  error  is  hard  to  estimate  because  of  the  unknown  and  hard-to-model 
effects  of  winds  and  waves  in  conjunction  with  ship  shape  and  size.  Another  un¬ 
known  factor  is  the  skill,  or  lack  of  it,  of  the  navigators.  Richardson  and  McKee 
(1984)  estimated  a  random  error  for  a  single  ship  drift  speed  to  be  20  cm/s.  System¬ 
atic  and  bias  errors  in  dead-reckoning  will  cancel  out  in  the  velocity  calculations, 
but  the  systematic  error  due  to  windage  will  not.  By  comparing  ship  drift,  drifting 
buoys,  and  current  meters,  McPhaden  et  al.  (1991)  estimated  an  upper  bound  of 
windage  effects  to  be  3  percent  of  the  surface  wind  speed.  For  SAR  applications, 
the  windage  bias  is  probably  not  a  problem  and  may  actually  help  when  looking 
for  distressed  ships.  On  the  other  hand,  windage  effects  downgrade  the  ship  drift 
velocity  estimates  for  comparisons  with  ocean  circulation  models. 

Wind  speed  information  was  not  used  in  the  calculation  of  the  error  level. 
Given  a  range  of  speeds  from  0  to  200  cm/s,  typical  maximum  speeds  of  order  100 
cm/s,  and  that  the  random  error  component  for  single  ship  drift  estimate  is  20  cm/s 
(Richardson  and  McKee,  1984)  leads  to  an  average  error  on  the  order  of  the  assumed 
20  percent  value  (  loVcm/i  ^  )•  future  calculations  may  want  to  assume  that 

this  error  is  on  the  order  of  10  percent  in  the  western  boundaries  and  on  the  order 
of  50  percent  in  mid-ocean  regions  where  typical  eddy  speeds  are  40-60  cm/s.  One 
fact  should  not  be  overlooked.  In  all  of  the  quoted  studies,  the  investigators  were 
surprised  at  how  much  useful  information  was  obtained  on  basin-wide  circulation 
patterns  and  their  seasonal  variability  after  averaging  ship  drift  data.  Though  the 
data  are  very  noisy,  the  law  of  large  numbers  and  a  data  set  consisting  of  over  four 
million  data  points  lead  to  stable  estimates  of  seasonal  ocean  currents. 

The  following  spatial  scales,  in  units  of  degrees  of  longitude  for  the  x  direction 
and  degrees  of  latitude  for  the  y  direction,  were  chosen  based  on  data  density 
considerations  and  because  the  vast  majority  of  the  data  resolve  scales  on  the  order 
of  300-400  km.  Dynamical  considerations  would  lead  to  smaller  correlation  scales 
and  more  structure  in  the  final  field  estimates.  Since  the  bicubic  splines  capture 
almost  all  of  the  large-scale  heterogeneity  of  the  surface  velocity  fields,  the  spatial 
scales  were  chosen  to  be  isotropic.  The  temporal  scale,  in  units  of  days,  was  chosen 
to  be  large  since  a  historical  average  of  data  from  many  different  years  is  wanted 
(not  the  typical  OA  synoptic  scale  estimate  using  just  one  month  of  data). 

G(l)  =  0.9,  C{2)  =  C(3)  =  ^7(9)  =  0.0,  C(4)  =  C(5)  =  3.0, 


C(6)  =  C{7)  =  2.0,  C(8)  =  99999. 

The  parameter  matrix  algorithm  results  are  presented  in  Section  4. 
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4.  Mariano  Global  Surface  Velocity  Analysis  1.0 


For  the  two  largest  basins,  the  Pacific  Ocean  and  the  Atlantic  Ocean,  the 
largest  amount  of  monthly  data  was  in  the  Atlantic  for  the  month  of  May,  a  total 
of  182,938  good  data  points,  and  the  smallest  amount  of  monthly  data  was  in  the 
Pacific  for  the  month  of  December,  a  total  of  49,562  good  data  points  (Table  1).  The 
OA  maps  and  normalized  mapping  error  plots  for  these  two  data  density  extremes 
are  presented  in  Figs.  2  and  3. 

Seasonal  and  yearly  averages  of  the  data  are  shown  in  Figs.  4-8.  These 
velocity  maps  contain  many  weU-known  ocean  features.  For  instance,  the  major 
western  boundary  currents-the  Gulf  Stream,  the  Brazil  Current,  the  Kuroshio,  E. 
Australian,  Somali,  Algulhas,  Labrador  Current,  Malvanus,  and  the  Kamchatka 
current,  as  well  as  the  equatorial  current  system-North  Equatorial  Current  (NEC), 
North  Equatorial  Counter  Current  (NECC),  and  the  South  Equatorial  Current 
(SEC)  in  all  three  major  ocean  basins,  are  clearly  visible  in  the  annual  maps  (Fig. 
4).  Major  eastern  boundary  currents— Guinea,  Angula,  Benguela,  California,  Peru, 
and  the  Leeuwin  currents-  are  also  clearly  visible. 

The  seasonal  averages— DJF,  MAM,  JJA,  and  SON  (Figs.  5-8)—  show  the  well- 
known  reversal  of  the  NEC  in  the  Indian  Ocean  during  the  onset  of  the  Southwest 
Monsoon  and  the  retroflection  of  the  N.  Brazil  current  into  the  NECC  in  the  latter 
half  of  the  year  (see  Richardson  and  Walsh,  1986).  This  agreement  with  well-known 
ocean  circulation  regimes  proves,  at  least  quaUtatively,  that  these  maps  would  be 
of  practical  use  to  the  USCG. 

The  reliability  of  these  maps  was  further  studied  in  a  visual  comparison  with 
the  present  historical  velocity  file  in  operational  use.  Based  on  these  comparisons, 
it  was  quite  clear  to  R&D  Ctr  that  this  data  product  was  far  superior  to  the  overly 
smooth  data  base  currently  in  use.  The  monthly  space/time  interpolated  velocity 
fields  were  termed  the  Mariano  Global  Surface  Velocity  Analysis,  hereafter  MGSVA. 
If  anything,  some  smoothing  of  the  MGSVA  in  some  areas  may  be  needed. 

The  statistics  of  the  differences  between  the  Atlantic  and  Indian  basins'  OA 
estimates  at  the  boundary  are  shown  in  Fig.  9.  These  calculations  were  performed 
for  the  area  10-30°  E,  30-70°  S,  south  of  South  Africa,  for  the  month  of  January. 
Figs.  9a  and  9b  are  the  histogram  of  the  velocity  differences  between  the  Atlantic 
and  Indian  estimates  for  the  u  and  v  velocity  component,  respectively.  These  dif¬ 
ferences  are  not  biased.  Fig.  9c  is  a  histogram  of  the  differences  in  speed  between 
the  Atlantic  and  Indian  estimates.  Fig.  9d  shows  the  vector  differences  between 
the  two  estimates.  Over  90  percent  of  the  speed  estimates  agree  to  within  30  ctti/s 
(Fig.  9c)  and  there  is  no  clear-cut  directional  bias  (Fig.  9d).  This  is  encouraging 
since  most  of  the  estimates  compared  here  are  from  the  data-sparse  areas  with  the 
largest  estimation  error. 

There  are  no  obvious  biases  between  the  two  different  sets  of  estimates.  Con¬ 
sequently,  the  two  sets  of  estimates  for  each  month  and  for  each  seam  should  be 
averaged  together  using  weights  that  are  inversely  proportional  to  each  error  esti¬ 
mate  and  normahzed  such  that  their  sum  is  one.  For  example,  let  ui  be  the  velocity 
estimate  associated  with  region  1  and  let  ej  be  the  estimation  variance  associated 
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Mariano  and  Ryan 

2(a) 


Fig.  2  (a)  The  velocity  scale  as  a  function  of  latitude.  (The  software  for  the  curly 
vector  plots  was  developed  at  the  Navy  Research  Lab  South.)  (b)  The  OA  map 
for  the  Atlantic  Ocean  for  the  month  of  May.  (c)  The  corresponding  error  map. 
(Note  that  most  of  the  basin,  except  for  the  sub-polar  and  polar  regions,  has  low 
estimation  error.) 
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Fig.  3  (a)  The  OA  map  for  the  Pacific  Ocean  for  the  month  of  December,  (b)  The 
corresponding  error  map.  (Note  that  the  North  Pacific  has  much  lower  estimation 
error  than  the  South  Pacific.) 
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Fig.  4  (a-d)  The  weighted  annual  average  of  the  monthly  surface  velocity  estimates. 
The  weights  were  inversely  proportional  to  the  error  estimates  associated  with  each 
monthly  estimate. 
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Fig.  5  (a-d)  The  weighted  seasonal  average  of  the  monthly  surface  velocity  estimates 
for  the  months  of  December,  January,  and  February. 
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Fig.  6  (a-d)  The  weighted  seasonal  average  of  the  monthly  surface  velocity  estimates 
for  the  months  of  March,  April,  and  May. 
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Fig.  7  (a-d)  The  weighted  seasonal  average  of  the  monthly  surface  velocity  estimates 
for  the  months  of  June,  July,  and  August. 
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Fig.  9  (a)  The  histogram  of  the  east- west  velocity  component,  u,  differences  between 
the  Atlantic  minus  the  Indian  basins,  (b)  Same  as  (a)  but  for  v.  (c)  A  histogram  of 
the  speed  differences,  ^/(u Atlantic  ~  '^indianY  +  Atlantic  ^  (d)  A  polar 

plot  of  the  vector  differences  between  the  Atlantic  and  Indian  basin  velocities. 


with  iii .  Let  «2  and  62  be  the  corresponding  variables  for  the  second  region.  Then 
the  final  estimate,  n/,  in  the  data  overlap  regions  should  be  given  by 

(e2Ui  +  611*2) 

“  (e,  +  €2)  ■ 

A  giTnila.r  expression  is  used  for  v.  The  final  blending  of  the  data  in  the  overlap 
regions  is  the  responsibility  of  the  USCG  R&D  Ctr. 

In  addition  to  the  velocity  estimates,  the  corresponding  uncertainties,  Cu  and 
(T®,  were  also  estimated.  Since  the  OA  scheme  usually  calculates  estimation  er¬ 
rors  for  the  problem  of  estimating  fields  at  a  given  estimation  time,  and  since  our 
estimates  here  represent  historical  averages,  the  typical  error  formula  (c.^.,  eqn. 
(3.8))  was  modified.  For  each  bin,  month,  and  velocity  component,  the  maximum 
normalized  error  level,  given  by  eqn.  (3.8),  was  calculated,  Cmax-  At  each  grid  point 
t,  the  standard  deviation  of  the  original  data,  <Ti,  was  calculated  from  the  sphne 
fits  to  the  standard  deviation  fields.  Let  the  normahzed  error  level  given  by  eqn. 
(3.8.1)  be  denoted  by  c.  Then  the  error  of  our  estimates  for  each  component  was 
estimated  by 

c(o't/ Cfnax)- 

The  final  velocity  fields  were  smoothed  by  a  first-order  Shapiro  filter  (Shapiro, 
1970)  to  remove  one-point  grid  noise  for  mapping  purposes. 
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5.  Limitations  and  Recommendations 

The  MGSVA  was  constructed  on  a  volunteer  basis  from  a  noisy,  unedited,  but 
very  large  data  set  using  existing  software  developed  for  another  project.  Conse¬ 
quently,  these  fields  can  be  improved  in  a  number  of  ways  by  generalizing  our  apph- 
cation  of  the  parameter  matrix  objective  analysis  algorithm  and  the  data  processing 
steps.  This  suggestion  is  discussed  after  the  discussion  of  the  inherent  hmitations 
of  Eulerian-based  field  estimates  for  USCG  operational  use. 

Ocean  current  systems  meander  or  change  their  positions  on  a  variety  of  time 
scales.  Coherent  circtJation  features,  such  as  Gulf  Stream  warm-core  rings,  frontal 
eddies,  planetary  waves,  etc.,  also  propagate  over  time.  It  is  a  well-known  fact 
that  Eulerian-based  estimates  will  smear  circulation  features  and  that  consequently, 
these  estimates  are  too  slow  (Mariano,  1990;  Mariano  and  Chin,  1995).  Halkin 
and  Rossby  (1985)  demonstrated  that  peak  Gulf  Stream  velocities  in  an  Eulerian- 
based  average  of  16  PEGASUS  velocity  transects  were  significantly  lower  than  peak 
velocities  in  a  stream-based  average  of  the  same  data,  100  cm/ s  versus  170  cm/s. 
Another  example  of  smearing  in  Eulerian  averaging  is  shown  in  Fig.  10,  where 
two  estimates  of  the  Gulf  Stream  front,  a  warm-  and  cold-core  ring,  are  averaged 
using  a  simple  arithmetic  average  with  equal  weights  of  1/2.  This  is  the  simplest 
example  that  is  relevant  to  this  OA  method.  Note  that  the  field  values  are  reduced 
by  a  factor  of  1/3  between  the  two  data  reahzations.  The  contour-based  averaging 
approach  of  Mariano  (1990)  shows  a  more  pleasing  estimate.  The  construction  of 
contour-based  software  for  averaging  the  MMSDD  would  be  a  daunting  task  because 
of  the  sophisticated  pattern  recognition  that  is  needed  for  contour-based  averaging 
and  is  not  recommended  at  this  time. 

A  far  more  economical  approach  is  to  compare  actual  drift  velocities  calculated 
in  the  field  and  to  compare  these  to  our  estimates  and  then  tabulate  the  amount 
of  underestimation.  A  smooth  regional  map  of  a  correction  factor,  which  may  be  a 
function  of  wind  speed,  could  then  be  apphed  to  the  OA  estimates  to  correct  some 
of  the  underestimation  problems  of  least-square  Eulerian-based  estimates. 

Based  on  the  lead  author’s  experience  with  feature-based  estimation,  it  is  sug¬ 
gested  that,  in  Computer  Aided  Search  Planning  (GASP)  2.0  or  its  successor,  a 
feature-based  combination  of  real-time  and  historical  data  would  lead  to  a  major 
improvement  in  SAR  activities  because  many  ocean  features  have  a  fairly  rigid 
cross-feature  structure.  For  example,  in  the  Gulf  Stream,  two  nearby  drifters  yield 
two  estimates  of  velocities  and  an  estimate  of  a  spatial  derivative.  This  would  tell 
you  what  side  of  the  Gulf  Stream  you  are  on  and,  by  translating  the  historical  Gulf 
Stream  cross-stream  velocity  structure  so  that  it  matches  the  in  situ  data,  would 
provide  a  much  better  estimate  of  the  true  velocity  field  than  either  an  OA  map  of 
the  drifter  velocity  or  the  MGSVA  data  by  itself  or  any  Eulerian-based  scheme.  As 
Fig.  10  illustrates,  a  simple  average  of  the  two  will  lead  to  undesirable  results,  and 
a  feature-based  approach  should  be  tried. 

As  demonstrated  by  Robinson  and  collaborators  (Robinson  et  aZ.,  1989;  Spall 
and  Robinson,  1990),  feature  models  can  be  used  with  satelhte  data  for  rehable 
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Fig.  10  (a)  An  estimate  of  the  streamfunction  field  for  the  Gulf  Stream  region,  (b) 
Same  as  (a)  but  another  field  estimate,  (c)  Using  the  classic  optimal  estimator  for 
combining  fields  from  (a)  and  (b).  (d)  Using  contour  analysis. 
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prediction  of  ocean  currents.  For  example,  Advanced  Very  High  Resolution  Ra- 
diometry  (AVHRR)  data  in  the  Gulf  Stream  can  convey  the  position  of  the  stream 
and  the  rings.  Analytical  three-dimensional  models  of  the  circulation  associated 
with  these  features  can  be  combined  with  other  data. 

In  addition  to  the  incorporation  of  feature-based  analysis,  a  number  of  possible 
generalizations  in  the  OA  of  these  data  can  be  tried.  Xhese  can  be  broadly  char¬ 
acterized  into  two  separate  research  avenues;  improvements  to  the  input  data  sets 
and  improvements  to  the  OA  algorithm.  The  input  data  set  can  be  improved  by 
assigning  error  bars  to  the  individual  drifts  since  our  OA  algorithm  allows  this.  For 
instance,  earlier  estimates  wo\dd  be  given  larger  bars  based  on  progress  in  naviga¬ 
tion  technology.  The  input  data  can  also  be  improved  by  incorporating  the  newest 
global  drifting  buoy  data  set  from  the  Tropical  Ocean  Global  Atmosphere  (TOGA) 
experiment,  the  World  Ocean  Circulation  Experiment  (^VOCE),  and  other  major 
ocean  field  experiments,  and  finding  the  missing  ship-drift  estimates.  This  should 
be  done  as  quickly  as  possible,  especially  in  the  data-sparse  southern  oceans.  Given 
the  planned  experiments  in  the  southern  ocean,  new  data  would  certainly  lead  to 
major  improvements  in  surface  velocity  estimates  south  of  50°  S. 

Another  potential  research  avenue  that  may  lead  to  major  improvement  in  this 
product  for  SAR  activities  would  involve  first  binning  the  data  by  wind  speeds  and 
directions.  Velocity  would  then  be  estimated  as  a  function  of  longitude,  latitude, 
and  wind  regime.  Visual  inspection  of  surface  drifters  in  a  Mineral  Management 
Service  video  (courtesy  of  Walter  Johnson)  show  a  strong  correlation  between  wind 
direction  and  drifter  movement.  It  is  very  evident  from  drifter  trajectories  in  this 
video  and,  given  our  estimates  for  this  region,  that  the  inherent  variability  between 
different  wind  regimes  will  lead  to  very  sub-optimal  SAR  drift  predictions  during 
anomalous  wind  conditions.  This  is  unavoidable  with  our  historical  analysis  because 
of  the  strongly  stochastic  nature  of  both  the  ocean  and  the  atmosphere.  It  is  highly 
recommended  that  this  research  avenue  be  given  high  priority. 

The  most  obvious  extension  to  our  OA  algorithm  would  be  the  use  of  a  multi¬ 
variate  OA  scheme  that  incorporates  the  cross-correlations  between  u  and  v.  Carter 
and  Robinson  (1987)  showed  that  their  multi-variate  approach  improved  velocity 
estimates  in  the  thermocline  waters  in  the  NW  Atlantic  over  scalar  OA  techniques 
using  the  POLYMODE  data  set.  This  improvement  was  expected  for  subsurface 
thermochne  flow  where  geostrophic  dynamics  lead  to  a  significant  cross-correlation 
between  u  and  v.  Our  analysis  indicates  significant  horizontal  divergences  and 
convergences  in  some  areas  that  would  cause  smaller  correlations  between  u  and 
V.  Thus,  the  calculation  of  regional  and  monthly  cross-correlations  between  u  and 
V  should  be  given  high  priority  so  that  the  issue  of  scalar  versus  multi-variate  OA 
scheme  for  surface  velocity  estimates  can  be  settled. 

OA  also  requires  the  specification  of  a  mean  and  covariance  function  at  each 
of  the  estimation  points.  This  specification  is  the  most  subjective  component  of  an 
OA  scheme.  A  tense,  smooth,  two-dimensional  bicubic  sphne  was  used  to  represent 
the  mean  field  for  each  variable  u  and  v.  The  sphne  was  fitted  to  the  monthly  data 
sets  in  all  of  the  basins  except  for  the  Antarctic,  where  data-sparsity  dictated  just 
one  annual  analysis.  These  sphnes  have  been  used  by  the  lead  author  extensively 
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and  have  performed  well  for  global  SST  estimation  (MB92).  Further  experiments 
with  the  adjustable  spline  parameters  is  needed,  but  is  not  crucial.  The  choice  of 
other  mean  surfaces  would  probably  not  improve  the  velocity  estimates. 

Because  oidy  a  finite  number  of  data  points  are  used  in  estimating  the  cor¬ 
relation  functions,  correlation  function  estimates  are  the  convolution  of  the  true 
correlation  function  with  a  sine  function.  Thus,  with  limited  data,  it  is  hard  to 
determine  the  correct  form  of  the  correlation  function.  Work  by  Thiebaux  (1975, 
1976)  suggests  that  if  a  fairly  general  correlation  function  is  used  and  fitted  to  the 
data,  it  would  be  hard  to  improve  on  it  unless  more  data  become  available.  So 
the  choice  of  another  correlation  function  would  probably  not  improve  the  velocity 
estimates,  but  using  the  full  power  of  the  parameter  matrix  may.  This  would  require 
estimation  of  the  nine  correlation  parameters  in  each  5“  x  5“  bin  for  each  basin  for 
each  month.  This  computer-intensive  component  should  be  first  tested  in  a  region 
that  is  both  of  high  priority  to  the  USCG  and  that  contains  numerous  good  quahty 
data  sets  for  comparisons.  The  latter  requirement  for  a  large-scale  region  would 
certainly  suggest  the  tropical  Pacific,  seeded  with  numerous  drifters  since  the  1980s 
and  with  a  large  increase  in  data-density  starting  in  the  early  1990s.  A  small-scale 
region,  also  of  interest  to  the  International  Ice  Patrol,  is  the  NW  Atlantic  Ocean. 

In  the  near  future,  assimilation  of  either  the  edited  data  or  the  mapped  fields 
into  numerical  circulation  models  forced  by  real-time  winds  is  recommended  for 
the  long-term  goal  of  accurate  and  reliable  sea  surface  velocity  estimates  that  are 
needed  for  USCG  activities.  Robinson  and  his  research  group  at  Harvard  University 
have  shown  that  this  is  feasible,  given  the  necessary  resources.  A  group  at  the 
National  Weather  Center  is  working  with  Mellor  (Princeton)  on  such  a  system 
for  the  East  Coast.  Rosenstiel  School  of  Marine  and  Atmospheric  Science  Ocean 
Pollution  Research  Center  at  the  University  of  Miami  is  experimenting  with  a  very 
high  resolution  numerical  circulation  model  in  the  Florida  Straits  area  and  has  been 
supplying  forecasts  to  the  Brother  to  the  Rescue  for  finding  Cuban  rafters  lost  at 
sea.  In  the  long  run,  although  this  is  more  costly,  it  is  the  best  approach.  In  the 
interim,  we  hope  that  the  use  of  the  MGSVA  in  USCG  SAR  operations  will  lead  to 
more  optimal  searching  strategies,  save  hves,  and  reduce  fuel  consumption  through 
more  efficient  SAR  flights. 
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AVHRR 

EKE 

MGSVA 

MMSDD 

MSDD 

NAVOCEANO 

NEC 

NECC 

OA 

RSMAS 

SAR 

SEC 

SST 

USCG 


Advanced  Very  High  Resolution  Radiometry 

Eddy  Kinetic  Energy 

Mariano  Global  Surface  Velocity  Analysis 

Monthly  Maury  Ship  Drift  Data  base 

Maury  Ship  Drift  Data  base 

U.S.  Naval  Oceanographic  Office 

North  Equatorial  Current 

North  Equatorial  Counter  Current 

Objective  Analysis 

Rosenstiel  School  of  Marine  and  Atmospheric  Science 

Search  and  Rescue 

South  Equatorial  Current 

Sea  Surface  Temperature 

U.S.  Coast  Guard 
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